10 research outputs found

    QuantPipe: Applying Adaptive Post-Training Quantization for Distributed Transformer Pipelines in Dynamic Edge Environments

    Pipeline parallelism has achieved great success in deploying large-scale transformer models in cloud environments, but has received less attention in edge environments. Unlike in cloud scenarios with high-speed and stable network interconnects, dynamic bandwidth in edge systems can degrade distributed pipeline performance. We address this issue with QuantPipe, a communication-efficient distributed edge system that introduces post-training quantization (PTQ) to compress the communicated tensors. QuantPipe uses adaptive PTQ to change bitwidths in response to bandwidth dynamics, maintaining transformer pipeline performance while incurring limited inference accuracy loss. We further improve the accuracy with a directed-search analytical clipping for integer quantization method (DS-ACIQ), which bridges the gap between estimated and real data distributions. Experimental results show that QuantPipe adapts to dynamic bandwidth to maintain pipeline performance while achieving a practical model accuracy using a wide range of quantization bitwidths, e.g., improving accuracy under 2-bit quantization by 15.85% on ImageNet compared to naive quantization.
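The adaptive idea in this abstract (lower the bitwidth when bandwidth drops) can be illustrated with a minimal sketch. This is not QuantPipe's implementation: the function names, the candidate bitwidths, the per-transfer time budget, and the naive min/max clipping are all assumptions made for illustration, and DS-ACIQ is not reproduced here.

```python
def quantize(values, bits):
    """Uniformly quantize a list of floats to `bits` bits (naive min/max clipping)."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value to the nearest integer code in [0, levels].
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]


def pick_bitwidth(num_elements, bandwidth_bps, budget_s, candidates=(8, 6, 4, 2)):
    """Return the largest candidate bitwidth whose transfer time fits the budget.

    `candidates` must be sorted from largest to smallest; the smallest is used
    as a fallback when nothing fits (hypothetical policy, not from the paper).
    """
    for bits in candidates:
        if num_elements * bits / bandwidth_bps <= budget_s:
            return bits
    return candidates[-1]
```

In this sketch, a sender would call `pick_bitwidth` with its current bandwidth estimate before each transfer, then send the codes together with `scale` and `lo` so the receiver can call `dequantize`.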

    Proteus: Network-aware Web Browsing on Heterogeneous Mobile Systems

    We present Proteus, a novel network-aware approach for optimizing web browsing on heterogeneous multi-core mobile systems. It employs machine learning techniques to predict which of the heterogeneous cores to use to render a given webpage and the operating frequencies of the processors. It achieves this by first learning offline a set of predictive models for a range of typical networking environments. A learnt model is then chosen at runtime to predict the optimal processor configuration, based on the web content, the network status and the optimization goal. We evaluate Proteus by implementing it in the open-source Chromium browser and testing it on two representative ARM big.LITTLE mobile multi-core platforms. We apply Proteus to the top 1,000 popular websites across seven typical network environments. Proteus achieves over 80% of the best available performance. It obtains, on average, over 17% (up to 63%), 31% (up to 88%), and 30% (up to 91%) improvement for load time, energy consumption, and the energy delay product, respectively, when compared to two state-of-the-art approaches.

    Distributed Edge Machine Learning Pipeline Scheduling with Reverse Auctions

    Scheduling distributed machine learning pipelines in edge environments is a growing area of research as developers work to bring large, high-accuracy models to relatively low-powered devices. Edge environment dynamics, such as device availability and connectivity, make distributed scheduling a more challenging problem than in traditional cloud environments. Existing approaches usually require significant a priori knowledge of the environment and make assumptions about model availability, both of which are impractical in real edge deployments. We address this problem by proposing a simple and efficient reverse auction algorithm, where a device that wants to distribute a large machine learning workload requests bids from available resources in the environment to construct connected pipelines. We implement our reverse auction scheduling on an existing distributed machine learning pipeline framework and perform an empirical evaluation using a real distributed edge computing testbed. We prove that scheduling distributed pipelines without repeating devices is an NP-complete problem, but that finding good latency or throughput pipelines is tractable for fixed device orderings. © 2023 IEEE
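To make the reverse-auction idea concrete, here is a minimal sketch in which a requesting device collects bids and greedily picks the cheapest bidder for each pipeline stage without reusing a device. It is an illustrative heuristic under stated assumptions, not the algorithm from the paper: the `Bid` record, the `latency_ms` cost, and the greedy stage-by-stage selection are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    device: str        # bidder identifier (hypothetical)
    stage: int         # pipeline stage the device offers to run
    latency_ms: float  # bidder's estimated latency for that stage


def select_pipeline(bids, num_stages):
    """Greedily pick the lowest-latency bid per stage, never reusing a device.

    Returns the list of winning bids in stage order, or None if some stage
    has no remaining bidder. Greedy selection is not guaranteed to be optimal.
    """
    used = set()
    pipeline = []
    for stage in range(num_stages):
        candidates = [b for b in bids if b.stage == stage and b.device not in used]
        if not candidates:
            return None
        winner = min(candidates, key=lambda b: b.latency_ms)
        used.add(winner.device)
        pipeline.append(winner)
    return pipeline
```

Because the greedy pass commits to each stage in order, it can miss a better overall assignment; it only illustrates the request-bids-then-select flow described in the abstract.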

    Portable Multicore Resource Management for Applications with Performance Constraints

    Many modern software applications have performance requirements, like mobile and embedded systems that must keep up with sensor data, or web services that must return results to users within an acceptable latency bound. For such applications, the goal is not to run as fast as possible, but to meet their performance requirements with minimal resource usage, the key resource in most systems being energy. Heuristic solutions have been proposed to minimize energy under a performance constraint, but recent studies show that these approaches are not portable - heuristics that are near-optimal on one system can waste integer factors of energy on others. The POET library and runtime system provides a portable method for resource management that achieves near-optimal energy consumption while meeting soft real-time constraints across a range of devices. Although POET was originally designed and tested on embedded and mobile platforms, in this paper we evaluate it on a manycore server-class system. The larger scale of manycore systems adds some overhead to adjusting resource allocations, but POET still meets timing constraints and achieves near-optimal energy consumption. We demonstrate that POET achieves portable energy efficiency on platforms ranging from low-power ARM big.LITTLE architectures to powerful x86 server-class systems

    Minimizing energy under performance constraints on embedded platforms


    POET: A Portable Approach to Minimizing Energy Under Soft Real-time Constraints

    Embedded real-time systems must meet timing constraints while minimizing energy consumption. To this end, many energy optimizations are introduced for specific platforms or specific applications. These solutions are not portable, however, and when the application or the platform changes, these solutions must be redesigned. Portable techniques are hard to develop due to the varying tradeoffs experienced with different application/platform configurations. This paper addresses the problem of finding and exploiting general tradeoffs, using control theory and mathematical optimization to achieve energy minimization under soft real-time application constraints. The paper presents POET, an open-source C library and runtime system that takes a specification of the platform resources and optimizes the application execution. We test POET's ability to deliver portable energy reduction on two embedded systems with different tradeoff spaces: the first with a mobile Intel Haswell processor, and the second with an ARM big.LITTLE System on Chip. POET achieves the desired latency goals with small error while consuming, on average, only 1.3% more energy than the dynamic optimal oracle on the Haswell and 2.9% more on the ARM. We believe this open-source, library-based approach to resource management will simplify the process of writing portable, energy-efficient code for embedded systems.

    Timely Wildfire Perimeter Mapping for Unmanned Aerial Platforms

    Wildfire perimeter mapping currently relies on deferred processing of data from manned and orbital platforms using hand-tuned physics-based models. We demonstrate real-time on-board multispectral data processing on cost-efficient unmanned aerial platforms using ML-based semantic segmentation

    Partnership for Research on Ebola VACcination (PREVAC): protocol of a randomized, double-blind, placebo-controlled phase 2 clinical trial evaluating three vaccine strategies against Ebola in healthy volunteers in four West African countries

    Introduction: The Ebola virus disease (EVD) outbreak in 2014–2016 in West Africa was the largest on record and provided an opportunity for large clinical trials and accelerated efforts to develop an effective and safe preventative vaccine. Multiple questions regarding the safety, immunogenicity, and efficacy of EVD vaccines remain unanswered. To address these gaps in the evidence base, the Partnership for Research on Ebola Vaccines (PREVAC) trial was designed. This paper describes the design, methods, and baseline results of the PREVAC trial and discusses challenges that led to different protocol amendments. Methods: This is a randomized, double-blind, placebo-controlled phase 2 clinical trial of three vaccine strategies against the Ebola virus in healthy volunteers 1 year of age and above. The three vaccine strategies being studied are the rVSVΔG-ZEBOV-GP vaccine, with and without a booster dose at 56 days, and the Ad26.ZEBOV,MVA-FN-Filo vaccine regimen with Ad26.ZEBOV given as the first dose and the MVA-FN-Filo vaccination given 56 days later. There have been 4 versions of the protocol, with those enrolled in Version 4.0 comprising the primary analysis cohort. The primary endpoint is based on the antibody titer against the Ebola virus surface glycoprotein measured 12 months following the final injection. Results: From April 2017 to December 2018, a total of 5002 volunteers were screened and 4789 enrolled. Participants were enrolled at 6 sites in four countries (Guinea, Liberia, Sierra Leone, and Mali). Of the 4789 participants, 2560 (53%) were adults and 2229 (47%) were children. Those < 18 years of age included 549 (12%) aged 1 to 4 years, 750 (16%) aged 5 to 11 years, and 930 (19%) aged 12 to 17 years. At baseline, the median (25th, 75th percentile) antibody titer to Ebola virus glycoprotein for 1090 participants was 72 (50, 116) EU/mL. Discussion: The PREVAC trial is evaluating, in a placebo-controlled design, two promising Ebola candidate vaccines in advanced stages of development. The results will address unanswered questions related to short- and long-term safety and immunogenicity for three vaccine strategies in adults and children. Trial registration: ClinicalTrials.gov NCT02876328. Registered on 23 August 2016.